
Loading optimizations #36742

Merged
Cyrilvallez merged 4 commits into main from loading-optimizations on Mar 18, 2025

@Cyrilvallez (Member) commented Mar 15, 2025

What does this PR do?

Profiling CPU memory usage showed that calling gc.collect() has no effect: the del statement is enough, and memory is released automatically as expected. Peak CPU memory usage never exceeds one state dict, even with older sharded .bin checkpoints.
Removing the call speeds up loading by 15-20%, since gc.collect() is costly and should generally not be called explicitly.
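A minimal sketch of why del is enough in CPython: reference counting frees an object the moment its last reference disappears, and gc.collect() only matters for objects caught in reference cycles. FakeStateDict and load_shards are hypothetical stand-ins, not transformers code; weakref.finalize is used only to record when each shard is actually freed.

```python
import gc
import weakref


class FakeStateDict(dict):
    """Hypothetical stand-in for one loaded checkpoint shard."""


freed = []  # indices of shards, in the order CPython freed them


def load_shards(num_shards: int) -> None:
    for i in range(num_shards):
        shard = FakeStateDict(weight=bytearray(1024))  # pretend these are the shard's tensors
        # Record the moment this shard is deallocated.
        weakref.finalize(shard, freed.append, i)
        # ... copy tensors from `shard` into the model here ...
        del shard  # last reference gone: freed immediately, before the next shard is loaded
        assert freed[-1] == i


gc.disable()  # show that the cyclic garbage collector plays no role here
try:
    load_shards(3)
finally:
    gc.enable()

print(freed)  # [0, 1, 2]
```

This relies on the shard not being part of a reference cycle; if something kept a cyclic reference to it, memory would only be reclaimed by the cyclic collector.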

As for the caching allocator, profiling shows that an allocation_factor > 1 does not help at all, neither for TP nor for regular loading on multiple GPUs. It is therefore much simpler to always use 1, which also means the warmup can never blow up GPU memory. In addition, the code previously allocated nothing when the allocation size was larger than the GPU memory, which could cause a big slowdown if that state was reached only because of the factor of 2, since the model itself might still fit.
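A minimal sketch of the failure mode described above, using a hypothetical helper (bytes_to_preallocate is not the transformers function) and made-up sizes for illustration: with a factor of 2, a model that fits on the GPU can still trigger the "allocate nothing" branch.

```python
GiB = 1024**3


def bytes_to_preallocate(model_bytes: int, device_bytes: int, allocation_factor: float) -> int:
    """Hypothetical helper mirroring the warmup behaviour described above.

    The requested warmup size is model_bytes * allocation_factor; if that exceeds
    the device size, nothing is pre-allocated at all.
    """
    requested = int(model_bytes * allocation_factor)
    if requested > device_bytes:
        return 0  # warmup skipped: loading falls back to many small allocations
    return requested


model, gpu = 50 * GiB, 80 * GiB  # the model fits, but twice the model does not
print(bytes_to_preallocate(model, gpu, allocation_factor=2))  # 0 (no warmup at all)
print(bytes_to_preallocate(model, gpu, allocation_factor=1))  # 53687091200
```

With a factor of 1, the warmup is skipped only when the model genuinely does not fit on the device.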
The allocator's granularity is also improved by checking the dtype of each parameter, since dtypes may differ in composite models or with keep_in_fp32_modules.
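A minimal sketch of why per-parameter dtypes matter when sizing the warmup. Parameters are modelled here as hypothetical (numel, dtype name) pairs rather than real tensors, and the two helper functions are illustrative, not transformers code.

```python
# Bytes per element for a few common dtypes.
DTYPE_SIZE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def bytes_single_dtype(params: list[tuple[int, str]], model_dtype: str) -> int:
    """Size estimate assuming every parameter uses the model's main dtype."""
    return sum(numel for numel, _ in params) * DTYPE_SIZE[model_dtype]


def bytes_per_param_dtype(params: list[tuple[int, str]]) -> int:
    """Size estimate using each parameter's own dtype."""
    return sum(numel * DTYPE_SIZE[dtype] for numel, dtype in params)


params = [
    (1_000_000, "bfloat16"),  # bulk of the model
    (200_000, "float32"),     # e.g. a module kept in fp32
]
print(bytes_single_dtype(params, "bfloat16"))  # 2400000 (underestimates the fp32 part)
print(bytes_per_param_dtype(params))           # 2800000
```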

github-actions bot marked this pull request as draft on March 15, 2025 14:08
@github-actions (Contributor)

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

Cyrilvallez marked this pull request as ready for review on March 15, 2025 14:18
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez merged commit db1d4c5 into main on Mar 18, 2025 (24 checks passed)
Cyrilvallez deleted the loading-optimizations branch on March 18, 2025 15:38

4 participants